10th World Congress in Probability and Statistics

Organized Contributed Session (live Q&A at Track 2, 9:30 PM KST)

Organized 12

Recent Developments for Dependent Data (Organizer: Mikyoung Jun)

Conference time: 9:30 PM to 10:00 PM KST
Local time: Jul 20 Tue, 8:30 AM to 9:00 AM EDT

DeepKriging: spatially dependent deep neural networks for spatial prediction

Ying Sun (King Abdullah University of Science and Technology (KAUST))

In spatial statistics, a common objective is to predict the values of a spatial process at unobserved locations by exploiting spatial dependence. In geostatistics, Kriging provides the best linear unbiased predictor using covariance functions and is often associated with Gaussian processes. However, for non-linear prediction with non-Gaussian or categorical data, the Kriging prediction is not necessarily optimal, and the associated variance is often overly optimistic. We propose to use deep neural networks (DNNs) for spatial prediction. Although DNNs are widely used for general classification and prediction, they have not been studied thoroughly for data with spatial dependence. In this work, we propose a novel neural network structure for spatial prediction that adds an embedding layer of spatial coordinates built from basis functions. We show in theory that the proposed DeepKriging method has multiple advantages over Kriging and over classical DNNs that use only spatial coordinates as features. We also provide density prediction for uncertainty quantification without any distributional assumption and apply the method to PM2.5 concentrations across the continental United States.
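
The embedding idea in this abstract can be illustrated with a minimal sketch. It assumes coordinates scaled to the unit square and uses Wendland radial basis functions on two regular grids of knots; the basis family, the grid sizes, the bandwidth rule, and the names `wendland_basis` and `embed` are all assumptions of this sketch, since the abstract does not specify them. The resulting feature matrix is what a DNN would receive in place of (or alongside) the raw coordinates.

```python
import numpy as np

def wendland_basis(coords, knots, bandwidth):
    """Compactly supported radial basis features for 2-D coordinates."""
    # Scaled pairwise distances, shape (n_points, n_knots)
    d = np.linalg.norm(coords[:, None, :] - knots[None, :, :], axis=2) / bandwidth
    # Wendland function (1 - d)^6 (35 d^2 + 18 d + 3) / 3 on [0, 1], zero beyond
    phi = (1.0 - d) ** 6 * (35.0 * d**2 + 18.0 * d + 3.0) / 3.0
    return np.where(d <= 1.0, phi, 0.0)

def embed(coords, grid_sizes=(3, 5)):
    """Stack basis features from several knot grids (hypothetical sizes)."""
    blocks = []
    for m in grid_sizes:
        axis = np.linspace(0.0, 1.0, m)
        knots = np.array([(x, y) for x in axis for y in axis])
        # Bandwidth proportional to knot spacing: an assumption of this sketch
        blocks.append(wendland_basis(coords, knots, bandwidth=2.5 / m))
    return np.hstack(blocks)

rng = np.random.default_rng(0)
coords = rng.uniform(size=(10, 2))   # ten synthetic locations
features = embed(coords)             # one row of basis features per location
```

Each row of `features` would then be passed to an ordinary feed-forward network, optionally concatenated with other covariates.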

A model-free subsampling method based on minimum energy criterion

Wenlin Dai (Renmin University of China)

The extraordinary amounts of data generated in science today place heavy demands on computational resources and time, hindering the implementation of various statistical methods. An efficient and popular strategy for downsizing data volumes, and hence alleviating these challenges, is subsampling. However, the existing methods either rely on specific assumptions for the underlying models or acquire only partial information from the available data. We propose a novel approach, termed adaptive subsampling, that is based on the minimum energy criterion (ASMEC). The proposed method requires no explicit model assumptions and 'smartly' incorporates information on covariates and responses. ASMEC subsamples possess two desirable properties: space-filling and spatial adaptiveness to the full data. We investigate the theoretical properties of the ASMEC estimator under the smoothing spline regression model and show that it converges at the same rate as two recently proposed basis selection methods. The effectiveness and robustness of the ASMEC approach are also supported by a variety of simulated examples and two real-life examples.

Global wind modeling with transformed Gaussian processes

Jaehong Jeong (Hanyang University)

Uncertainty quantification of wind energy potential from climate models can be limited because it requires considerable computational resources and is time-consuming. We propose a stochastic generator that aims to reproduce the data-generating mechanism of climate ensembles for global annual, monthly, and daily wind data. Inference is based on a multi-step conditional likelihood approach and is made feasible for a large data set by balancing memory storage and distributed computation. Finally, we discuss a general framework for modeling non-Gaussian multivariate stochastic processes by transforming underlying multivariate Gaussian processes.

Threshold estimation for continuous three-phase polynomial regression models with constant mean in the middle regime

Chih-Hao Chang (National University of Kaohsiung)

This talk considers a continuous three-phase polynomial regression model with two threshold points for dependent data with heteroscedasticity. We assume the model is a polynomial of order zero in the middle regime and a polynomial of higher order elsewhere. We denote this model by M2, which includes models with one or no threshold points, denoted by M1 and M0, respectively, as special cases. We provide an ordered iterative least squares (OiLS) method for estimating M2 and establish the consistency of the OiLS estimators under mild conditions. We also apply a model-selection procedure for selecting Mk, k = 0, 1, 2. When the underlying model exists, we establish the selection consistency under the aforementioned conditions. Finally, we conduct simulation experiments to assess how well our asymptotic results hold in finite samples.
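
To make the model shape concrete, here is a minimal sketch of one way to write a continuous mean function that is constant between two thresholds and polynomial outside them. The parameterization through one-sided power terms, and the names `three_phase_mean`, `tau1`, `tau2`, `c`, `left_coefs`, and `right_coefs`, are assumptions of this sketch rather than the talk's notation; the OiLS estimator itself is not reproduced.

```python
import numpy as np

def three_phase_mean(x, tau1, tau2, c, left_coefs, right_coefs):
    """Mean equal to c on [tau1, tau2] and polynomial outside, continuous at both thresholds."""
    x = np.asarray(x, dtype=float)
    left = np.maximum(tau1 - x, 0.0)    # positive only left of the first threshold
    right = np.maximum(x - tau2, 0.0)   # positive only right of the second threshold
    mean = np.full_like(x, c)
    # Powers start at 1, so every term vanishes at its threshold (continuity)
    for j, b in enumerate(left_coefs, start=1):
        mean += b * left**j
    for j, b in enumerate(right_coefs, start=1):
        mean += b * right**j
    return mean

values = three_phase_mean([0.0, 1.0, 1.5, 2.0, 3.0], tau1=1.0, tau2=2.0,
                          c=5.0, left_coefs=[2.0], right_coefs=[1.0, 0.5])
```

Because each one-sided term is zero at its own threshold, continuity holds for any coefficient values, which is the property the word "continuous" in the title refers to.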

Q&A for Organized Contributed Session 12

This talk does not have an abstract.

Session Chair

Mikyoung Jun (University of Houston)

Organized 16

Non-Euclidean Statistical Inference (Organizer: Young Kyung Lee)

Conference time: 9:30 PM to 10:00 PM KST
Local time: Jul 20 Tue, 8:30 AM to 9:00 AM EDT

Functional linear regression model with randomly censored data: predicting conversion time to Alzheimer's disease

Seong Jun Yang (Jeonbuk National University)

Predicting the onset time of Alzheimer's disease is of great importance in preventive medicine. Structural changes in brain regions have been actively investigated in association studies of Alzheimer's disease diagnosis and prognosis. In this study, we propose a functional linear regression model to predict the conversion time to Alzheimer's disease among patients with mild cognitive impairment. The change in vertical thickness of the corpus callosum is measured from magnetic resonance imaging scans and entered into the model as a functional covariate. A synthetic response approach is taken to deal with the censored data. The simulation studies demonstrate that the proposed model successfully predicts the unobserved true survival time, but they also indicate that a high censoring rate may lead to poor prediction of the conversion time. Through an application to ADNI data, we find that atrophy in the rear area of the corpus callosum is a possible neuroimaging marker for Alzheimer's disease prognosis.

Deconvolution estimation on hyperspheres

Jeong Min Jeon (Katholieke Universiteit Leuven)

This paper considers nonparametric estimation with contaminated data observed on the unit hypersphere $S^d$. For such data, we consider deconvolution density estimation and regression analysis. Our methodology and theory are based on harmonic analysis on $S^d$, which is largely unknown in statistics. We propose novel deconvolution density and regression estimators and study their asymptotic properties, including the rates of convergence and asymptotic distributions. We also provide asymptotic confidence intervals. We present practical details on implementation as well as the results of numerical studies.

Confidence band for persistent homology of KDEs

Jisu Kim (Inria)

The persistent homology of the upper level sets of a probability density function quantifies the salient topological features of data. Such a target quantity can be well estimated using the persistent homology of the upper level sets of a kernel density estimator (KDE). In this talk, I will present how a confidence band can be computed, based on the bootstrap procedure, for determining the significance of the topological features in the persistent homology of KDEs. First, I will present how the confidence band can be computed for the persistent homology of KDEs computed on a grid. In practice, however, computing the persistent homology on a grid is infeasible when the dimension of the ambient space is high or when topological features are at different scales. Hence, I will consider the persistent homology of KDEs on Vietoris-Rips complexes over the sample points. I will describe how to construct a valid confidence band for the persistent homology of KDEs on Vietoris-Rips complexes based on the bootstrap procedure.
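
The grid-based case can be illustrated with a minimal one-dimensional sketch: compute a Gaussian KDE on a grid, resample the data with replacement, and take a quantile of the sup-norm deviation between the bootstrap KDEs and the original KDE as the band half-width. The Gaussian kernel, the fixed bandwidth `h`, the number of bootstrap replicates, and the function names are assumptions of this sketch; the persistent homology computation and the Vietoris-Rips variant from the talk are not reproduced.

```python
import numpy as np

def gaussian_kde_on_grid(sample, grid, h):
    """Evaluate a Gaussian KDE with bandwidth h at every grid point."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * h * np.sqrt(2.0 * np.pi))

def bootstrap_band_width(sample, grid, h, n_boot=200, alpha=0.05, seed=0):
    """Return the KDE and the bootstrap (1 - alpha) quantile of the sup-norm deviation."""
    rng = np.random.default_rng(seed)
    kde = gaussian_kde_on_grid(sample, grid, h)
    sup_dev = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the data with replacement and recompute the KDE
        resample = rng.choice(sample, size=len(sample), replace=True)
        sup_dev[b] = np.abs(gaussian_kde_on_grid(resample, grid, h) - kde).max()
    return kde, np.quantile(sup_dev, 1.0 - alpha)

rng = np.random.default_rng(1)
sample = rng.normal(size=100)            # synthetic data for illustration
grid = np.linspace(-5.0, 5.0, 201)
kde, c = bootstrap_band_width(sample, grid, h=0.4)
```

By the stability of persistent homology, two functions within sup-norm distance `c` have persistence diagrams within bottleneck distance `c`, which is why a band on the KDE translates into a band on its persistence diagram.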

Analysis of chemical-gene bipartite network via a user-based collaborative filtering method incorporating chemical structure information

Namgil Lee (Kangwon National University)

Drug repositioning refers to finding new applications and different uses of known drugs. In this study, we introduce a network analysis approach for drug repositioning. In particular, we introduce a user-based collaborative filtering method for analyzing bipartite networks between chemicals and genes. Moreover, under the assumption that structural similarity between chemicals is closely related to functional similarity, we propose an improved measure of similarity between chemicals. Numerical experiments are conducted to evaluate the statistical significance of the proposed method on the CTD database.
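
A minimal sketch of generic user-based collaborative filtering on a bipartite network may help fix ideas. Chemicals play the role of users and genes the role of items; similarity between chemicals is taken to be the cosine similarity of their rows in a binary interaction matrix. This is the textbook form of the technique, not the talk's method: the structure-based similarity measure from the abstract is not reproduced, and the name `user_based_scores` and the toy matrix are assumptions of this sketch.

```python
import numpy as np

def user_based_scores(A):
    """Score each chemical-gene pair from the interactions of similar chemicals."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0            # avoid dividing by zero for empty rows
    unit = A / norms
    sim = unit @ unit.T                  # cosine similarity between chemicals
    np.fill_diagonal(sim, 0.0)           # a chemical does not vote for itself
    weight = sim.sum(axis=1, keepdims=True)
    weight[weight == 0.0] = 1.0
    return (sim @ A) / weight            # similarity-weighted average of neighbours

# Toy chemical-by-gene interaction matrix (rows: chemicals, columns: genes)
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1]])
scores = user_based_scores(A)
```

High scores at positions where `A` is zero are candidate new chemical-gene links; a structure-aware variant would replace or blend `sim` with a similarity computed from chemical structure.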

Q&A for Organized Contributed Session 16

This talk does not have an abstract.

Session Chair

Young Kyung Lee (Kangwon National University)
